ABSTRACT

This paper describes a model that integrates an item 
response theory (IRT) Rasch model and a hierarchical linear model and 
presents a method of estimating model parameter values that does not rely on 
large-sample theory and normal approximations. The model resulting from the 
integration of a hierarchical linear model and the Rasch model allows one to 
estimate all model parameters simultaneously and thus incorporate the 
standard errors of the latent trait estimates into the total variance of the 
model. The Rasch hierarchical measurement model (HMM) can allow one, for 
example, to model the variances of person-level and school-level error while 
estimating latent trait parameters of student ability estimates or student 
attitudes from student responses to a questionnaire of dichotomous items. Two 
different simulated data sets, both having a hierarchical structure, were 
created to illustrate the Rasch HMM. To illustrate how the Rasch HMM performs 
relative to a traditional two-step approach, the simulated balanced data set 
was reanalyzed. Results show that the Rasch HMM is very specialized because 
it is only appropriate for dichotomous responses and does not allow the 
incorporation of any level-1 or level-2 covariates. The usefulness of 
hierarchical measurement models hinges on the degree to which these models 
can be generalized. (Contains 4 figures, 6 tables, and 29 references.) (SLD) 
A Rasch Hierarchical Measurement Model

Both item response theory and hierarchical linear modeling are used in a variety of social science 
research applications. The use of item response theory (IRT) allows connections to be made between 
observed categorical responses provided by students and an underlying unobservable trait, such as ability 
or attitude (Hambleton & Swaminathan, 1985; Lord & Novick, 1968). Hierarchical linear modeling 
(HLM) allows the natural multilevel structure present in so much social science data to be represented 
formally in data analysis (Bryk & Raudenbush, 1992; Goldstein, 1987; Longford, 1993). In some cases, a 
researcher may wish to study the effects of covariates on the latent trait of interest. These covariates may 
include information about the respondents, as well as contextual information. This paper will present 
both a model that integrates an IRT and hierarchical linear model and a method of estimating model 
parameter values that does not rely on large-sample theory and Normal approximations. 

Item response theory models and hierarchical linear models can be combined to model the effect 
of multilevel covariates on a latent trait. We may wish to examine relationships between person ability 
estimates and person-level and contextual-level characteristics that may affect these ability estimates. 
Alternatively, we may wish to model data obtained from the same individuals across repeated 
questionnaire administrations. We may even wish to study the effect of person characteristics on ability 
estimates over time. 

In particular, the model resulting from the integration of a hierarchical linear model and a
one-parameter logistic item response model will be presented in this paper. This model will be referred to as a
Rasch hierarchical measurement model (HMM). The particular Rasch HMM developed in this study 
incorporates a Rasch model (Rasch, 1960) and a two-level hierarchical linear model having a random 
intercept at the first level, with no additional fixed or random covariates at either level. This form of a 
hierarchical linear model is known as a one-way analysis of variance with random effects. The Rasch 
model is appropriate for modeling dichotomous responses and models the probability of an individual's 
correct response on a dichotomous item. The logistic item characteristic curve, a function of ability, 
forms the boundary between the probability areas of answering an item incorrectly and answering the 
item correctly. This one-parameter logistic model assumes that the discriminations of all items are
equal to one.
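
The response probability described above can be illustrated with a short sketch. It assumes the usual logistic form of the Rasch model with a unit discrimination; the function name `rasch_probability` and the argument names `theta` (latent trait) and `xi` (item difficulty) are chosen here for illustration only.

```python
import math

def rasch_probability(theta: float, xi: float) -> float:
    """Probability of a correct response under the Rasch model.

    theta: latent trait (ability) of the respondent (illustrative name).
    xi: difficulty of the item (illustrative name).
    The discrimination is fixed at one, as in the one-parameter model.
    """
    return 1.0 / (1.0 + math.exp(-(theta - xi)))

# When ability equals difficulty, a correct response is as likely as not.
p_equal = rasch_probability(0.5, 0.5)
# A more able respondent has a higher chance on the same item.
p_higher = rasch_probability(1.5, 0.5)
```

The curve traced by this function as `theta` varies is the logistic item characteristic curve mentioned above.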

The model resulting from the integration of a hierarchical linear model and a Rasch model allows 
one to estimate all model parameters simultaneously and therefore incorporate the standard errors of the 
latent trait estimates into the total variance of the model. In the Rasch HMM, the expected value of the 
latent trait parameter is replaced with a one-way ANOVA with random effects. The Rasch HMM can 
allow one, for example, to correctly model the variances of person-level and school-level error while 
estimating latent trait parameters of student ability estimates or student attitudes from student responses to 
a questionnaire of dichotomous items. 

Researchers have expanded traditional IRT models in a number of ways that are appropriate in a 
variety of applications. Person-level characteristics have been included in IRT models to help improve 
estimation of item difficulty parameters, or to model the effects of person characteristics upon the 
estimated latent trait measures (Mislevy, 1987; Patz & Junker, 1999a; Patz & Junker, 1999b). The IRT 
model has also been reformulated as a two-level model consisting of items nested within people in order 
to model measurement error among and between these two levels (Adams, Wilson, & Wu, 1997; Kamata, 
1998). Kamata (1998) takes this last example a step further by including a third contextual level, which is 
illustrated by Cheong & Raudenbush (2000). 

A variety of methods have been used to estimate the parameters of these expanded IRT models. 
A two-step approach has sometimes been used. Using this strategy, an IRT model is used to estimate
latent trait parameters for each person, which are then analyzed with a hierarchical linear model. The standard
errors of the latent trait estimates are not modeled in the second step, resulting in biased parameter 
estimates. The extent of this bias can be especially large when total sample size is small or when the 
hierarchical structure is sparsely populated. Others have utilized methods that rely on large-sample 
approximations or empirical Bayes approaches (Adams et al., 1997; Cheong & Raudenbush, 2000; 
Kamata, 1998; Mislevy, 1987; Zwinderman, 1991; Zwinderman, 1997). The use of these particular 
estimation methods, because they depend on Normal distribution theory, introduces constraints on the 
minimum allowable sample size or the degree to which the hierarchical structure can be sparsely
populated. In addition, complex integrations are usually required within the context of the solution 
strategy. 

Bayesian methods, a third approach to estimating model parameters of an expanded IRT model, 
do not rely on Normal approximations. Bayesian methods allow an easier solution strategy that produces 
unbiased estimates and eliminates the need for directly computing complex integrations (Bayes, 1763; 
Gelman, Carlin, Stern, & Rubin, 1995). Values for the parameters of the Rasch hierarchical measurement
model will be estimated using Bayesian data analysis methods. 

The Bayesian paradigm assumes the model parameters are random quantities having 
distributions. The distributions characterizing these unknown parameters are conditional on the observed 
data, which are assumed to be fixed. Bayesian inference supplements the likelihood equation with prior 
beliefs the analyst may have about the distributions of the parameters, via prior distributions. The 
likelihood and prior distributions are combined according to Bayes' theorem to produce the posterior 
distribution of the model parameters to be estimated. In contrast, Normal theory or the frequentist method 
postulates that the true values of the parameters are fixed and the data are random, and relies on
large-sample approximations to produce estimates of model parameters. Empirical Bayesian methods make use
of both paradigms. A subset of parameters is estimated and treated as fixed and known values in a
subsequent Bayesian data analysis technique to estimate the remaining unknown parameters. Typically, 
estimates of the first subset of model parameters are obtained using frequentist methods that rely on 
approximations. 

Markov Chain Monte Carlo (MCMC) techniques are particular Bayesian data analysis methods 
that are utilized to estimate model parameters. In contrast to frequentist methods that produce a model 
parameter estimate and a standard error of the estimate, MCMC techniques can be used to produce the 
entire posterior distribution of the model parameter estimate. Gibbs sampling, a specific MCMC 
technique, is a method for generating random variables from a distribution by sampling from the 
collection of full conditional distributions of the complete posterior distribution (Gelfand et al., 1990). In 
complex models such as the case of the Rasch hierarchical measurement model, a complicated posterior
distribution can be represented as a collection of conditional probability distributions having standard 
distributional forms. A single sampled data point is drawn from the conditional probability distribution of 
each parameter, conditional on the values of the collection of remaining parameters and the data. The 
marginal probability distributions of the parameters can be constructed from the random draws after the 
Markov chain has converged. 

Bayesian data analysis methods were used to produce parameter estimates of the Rasch 
hierarchical measurement model. In particular, estimates of the parameters were found using Gibbs 
sampling. When a parameter did not have a conditional distribution of a common distributional form (the
latent trait parameters and the item difficulty parameters), the Metropolis-Hastings algorithm was utilized
to generate a random draw from the conditional distribution (Hastings, 1970; Metropolis et al., 1953).
Patz & Junker (1999b) estimate parameters for a two-parameter logistic model using the combination of 
these particular Bayesian methods, and provide a detailed description of these MCMC methods within the 
context of IRT models. 

Construction of the Posterior & Full Conditional Distributions 

As a first step in Bayesian data analysis, the prior distributions for all model parameters must be 
specified in order to form a posterior density. For the latent trait parameter, it is sensible to assume that 
the latent trait of individual $n$, $\theta_n$, is drawn from a normal distribution (Lord & Novick, 1968) with
unknown mean and variance,

$$p_\theta(\theta_n) \sim \text{Normal}(\mu_\theta, \sigma_\theta^2). \tag{1}$$

Specific prior distributions for the hyperparameters of the latent trait distribution will be assigned later;
for now, these prior distributions will be noted merely as $p(\mu_\theta)$ and $p(\sigma_\theta^2)$.

Typically, item difficulty parameters range between -4 and +4 standard deviations and can be 
modeled by a unimodal symmetric distribution (Baker, 1992). Consequently, a normal prior distribution 
will be assigned to the item difficulty parameters, 

$$p_\xi(\xi) \sim \text{Normal}(\mu_\xi, \sigma_\xi^2). \tag{2}$$

Upon examining the Rasch model, it is clear that the model is unidentified when estimates for all latent 
trait and item difficulty parameters are unknown. This difficulty can be addressed by assuming that the
mean of the item parameters is zero and the variance is one. This constraint can be directly incorporated 
into the prior distribution for the item difficulty parameters (Box & Tiao, 1973). 

The use of Gibbs sampling requires that all full conditional distributions of the model parameters
be determined. Consider the case where students (level-1) are nested within classrooms (level-2), and the
outcome variable matrix consists of the dichotomous response strings students provide on an $I$-item test.
Given the $N \times I$ matrix $x$ for $N = \sum_{k=1}^{K} n_k$ individuals answering $I$ items, and assuming conditional
independence among the responses, the likelihood of observing the response strings $x$ for $N$ students
nested within $K$ classrooms is

$$L(\theta, \xi \mid x) = p(x \mid \theta, \xi) = \prod_{k=1}^{K} \prod_{j=1}^{n_k} \prod_{i=1}^{I} \frac{\exp[x_{ijk}(\theta_{jk} - \xi_i)]}{1 + \exp(\theta_{jk} - \xi_i)}. \tag{3}$$

This is very similar to the likelihood equation for a one-parameter logistic item response model, with the
addition of an extra indexing variable $k$. The unknown parameters in the likelihood include the latent trait
variables $\theta$ and the item difficulty parameters $\xi$. A student's latent trait parameter can be modeled with a
one-way ANOVA with random effects. The latent trait parameters in group $k$ are expected to have a value
$\alpha_k$ and a variance $\sigma_\varepsilon^2$. The random intercept $\alpha_k$ at the student level is in turn modeled by a linear
equation, and is expected to have a mean value of $\gamma_{00}$ and a variance of $\tau_{00}$,

$$\theta_{jk} = \alpha_k + \varepsilon_{jk}, \tag{4}$$

$$\varepsilon_{jk} \sim \text{Normal}(0, \sigma_\varepsilon^2), \tag{5}$$

$$\alpha_k = \gamma_{00} + u_{0k}, \tag{6}$$

$$u_{0k} \sim \text{Normal}(0, \tau_{00}). \tag{7}$$

The posterior distribution of the Rasch HMM is the product of the likelihood equation and the prior
distributions of all unknown parameters,

$$p(\xi, \theta, \alpha, \gamma_{00}, \sigma_\varepsilon^2, \tau_{00}, \sigma_\xi^2 \mid x) \propto \prod_{k=1}^{K} \prod_{j=1}^{n_k} p(\theta_{jk} \mid \alpha_k, \sigma_\varepsilon^2)\, p(\alpha_k \mid \gamma_{00}, \tau_{00})\, p(\sigma_\varepsilon^2)\, p(\gamma_{00})\, p(\tau_{00})\, p(\sigma_\xi^2)\, p(x \mid \theta, \xi). \tag{8}$$
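
To make the structure of the posterior concrete, the following sketch evaluates an unnormalized log posterior for a small nested data set. It is an illustration under stated assumptions rather than the implementation used in the study: the priors on the fixed intercept and the two error variances are taken to be flat (so they contribute only a constant), the item difficulties are given a standard normal prior, each random intercept's prior is counted once per group, and all function and argument names are hypothetical.

```python
import math

def log_normal_density(value, mean, variance):
    """Log of a normal density, including the normalizing constant."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (value - mean) ** 2 / (2.0 * variance)

def log_posterior(x, theta, xi, alpha, gamma00, sigma2_eps, tau00):
    """Unnormalized log posterior for nested dichotomous responses.

    x[k][j][i]: 0/1 response of student j in group k to item i.
    theta[k][j]: latent trait; xi[i]: item difficulty; alpha[k]: random intercept.
    Assumes flat priors on gamma00, sigma2_eps, tau00 (illustrative choice).
    """
    total = 0.0
    for k, group in enumerate(x):
        # Prior of the random intercept given the level-2 parameters.
        total += log_normal_density(alpha[k], gamma00, tau00)
        for j, responses in enumerate(group):
            # Prior of the latent trait given the random intercept.
            total += log_normal_density(theta[k][j], alpha[k], sigma2_eps)
            for i, x_ijk in enumerate(responses):
                # Rasch log-likelihood contribution of a single response.
                eta = theta[k][j] - xi[i]
                total += x_ijk * eta - math.log1p(math.exp(eta))
    # Standard normal prior on each item difficulty.
    total += sum(log_normal_density(b, 0.0, 1.0) for b in xi)
    return total

# Two groups (two students and one student), three items.
x = [[[1, 0, 1], [0, 0, 1]], [[1, 1, 1]]]
theta = [[0.3, -0.5], [1.2]]
xi = [-1.0, 0.0, 1.0]
alpha = [0.0, 1.0]
value = log_posterior(x, theta, xi, alpha, 0.0, 0.2835, 0.7099)
```

Working on the log scale avoids numerical underflow when many response probabilities are multiplied together.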

The hierarchical linear model is incorporated into the hierarchical measurement model via the
prior distribution for the latent trait parameter, $p(\theta \mid \alpha, \sigma_\varepsilon^2)$. Based on the normal distribution specified
earlier for the latent trait parameter, the prior distribution for the students’ latent trait parameters is
conditional upon the level-1 random intercept and the level-1 error variance of the hierarchical linear
model,

$$p(\theta \mid \alpha, \sigma_\varepsilon^2) = \prod_{k=1}^{K} \prod_{j=1}^{n_k} \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}} \exp\left[-\frac{1}{2\sigma_\varepsilon^2}(\theta_{jk} - \alpha_k)^2\right]. \tag{9}$$



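The prior distribution for the level-1 random intercept can be constructed by assuming that the
level-1 random intercept $\alpha_k$ is normally distributed with a mean of the level-2 fixed intercept $\gamma_{00}$ and a
variance of the level-2 error variance $\tau_{00}$,

$$p(\alpha \mid \gamma_{00}, \tau_{00}) = \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi\tau_{00}}} \exp\left[-\frac{1}{2\tau_{00}}(\alpha_k - \gamma_{00})^2\right]. \tag{10}$$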

Combining the prior distribution for the latent trait parameter (9) and the likelihood equation (3), the full
conditional posterior for the latent trait parameter of an individual student is

$$p(\theta_{jk} \mid \xi, \theta_{<j><k>}, \alpha, \gamma_{00}, \sigma_\varepsilon^2, \tau_{00}, x) \propto \frac{\exp\left[\sum_{i=1}^{I} x_{ijk}(\theta_{jk} - \xi_i)\right]}{\prod_{i=1}^{I}\left[1 + \exp(\theta_{jk} - \xi_i)\right]} \exp\left[-\frac{1}{2\sigma_\varepsilon^2}(\theta_{jk} - \alpha_k)^2\right]. \tag{11}$$




The full conditional posterior distribution for an individual student is conditional on the remaining
students’ latent trait parameters, as indicated by the notation $\theta_{<j><k>}$ in (11). Since (11) is not the kernel
of any standard probability distribution, this posterior conditional distribution cannot be directly sampled
from, necessitating an alternate strategy to generate random draws.

The full conditional probability distributions of the level-1 random intercepts and the level-2 
fixed intercept can be expressed as products of the likelihood equation and normal prior distributions. 
Utilizing (10) as the prior distribution for the level-1 random intercept $\alpha_k$, the full conditional probability
distribution for this parameter is 



Since this is a case of normal data, with a normal prior distribution, the full conditional probability 
distribution can be reformulated as a normal distribution from which sampling is easy (Box & Tiao, 1973; 
Seltzer & Ang, 1999), 
where

In the case of a one-way ANOVA with random effects, the expected value of the level-1 random
intercept is the level-2 fixed intercept $\gamma_{00}$. A uniform prior for the level-2 fixed intercept will be
assumed. The full conditional probability distribution of this parameter as a function of the conditional
distributions of the level-1 random intercepts is



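$$p(\gamma_{00} \mid \theta, \xi, \alpha, \sigma_\varepsilon^2, \tau_{00}, x) \propto p(\alpha \mid \gamma_{00}, \tau_{00})\, p(\gamma_{00}) \tag{17}$$

$$\propto \exp\left[-\frac{1}{2\tau_{00}} \sum_{k=1}^{K} (\alpha_k - \gamma_{00})^2\right]. \tag{18}$$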



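As a consequence of the manipulation of (18), the full conditional probability distribution for the level-2
fixed intercept $\gamma_{00}$ is a normal distribution,

$$\gamma_{00} \mid \alpha, \tau_{00} \sim \text{Normal}\left(\bar{\alpha}, \frac{\tau_{00}}{K}\right), \tag{19}$$

with a mean of the average of all level-1 random intercepts $\bar{\alpha}$, taken across all $K$ level-2 groups.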
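
Because both intercept parameters have normal full conditionals, they can be drawn directly within each Gibbs iteration. The sketch below assumes the standard normal-normal conjugate update (a precision-weighted mean) for a random intercept, and a normal distribution centered on the average of the random intercepts with variance equal to the level-2 variance divided by the number of groups for the fixed intercept under a uniform prior. The function names and the parameterization are illustrative and may differ from the form used in the original derivation.

```python
import math
import random

def draw_alpha_k(theta_k, gamma00, sigma2_eps, tau00, rng):
    """Draw one random intercept from its normal full conditional.

    theta_k: current latent trait values of the units in group k.
    Uses the conjugate normal-normal update (an assumed, standard form).
    """
    n_k = len(theta_k)
    precision = n_k / sigma2_eps + 1.0 / tau00
    variance = 1.0 / precision
    mean = variance * (sum(theta_k) / sigma2_eps + gamma00 / tau00)
    return rng.gauss(mean, math.sqrt(variance))

def draw_gamma00(alpha, tau00, rng):
    """Draw the fixed intercept given all random intercepts (uniform prior)."""
    k_groups = len(alpha)
    mean = sum(alpha) / k_groups
    return rng.gauss(mean, math.sqrt(tau00 / k_groups))

rng = random.Random(0)
alpha_draws = [draw_alpha_k([0.4, 0.9, 0.2], 0.0, 0.2835, 0.7099, rng) for _ in range(5000)]
gamma_draws = [draw_gamma00([0.5, -0.3, 0.1, 0.9], 0.7099, rng) for _ in range(5000)]
```

In a full sampler each function would be called once per iteration with the current values of the other parameters; the repeated draws here only illustrate the shape of each conditional distribution.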

Several prior distributions for the level-1 error variance $\sigma_\varepsilon^2$ and the level-2 error variance $\tau_{00}$ will
be considered here. In particular, both informative and noninformative prior distributions will be
assumed. Utilizing the uniform distribution as a prior distribution provides the least amount of prior
information possible. The use of this prior distribution suggests that any value for the estimate of the
parameter is equally likely and yields a full conditional probability that is dependent only upon the
likelihood equation. The full conditional probability distributions of the inverses of the level-1 error
variance $\sigma_\varepsilon^2$ and the level-2 error variance $\tau_{00}$, assuming uniform prior distributions, are kernels of
gamma probability distributions,



The conditional probability distributions for the variances can be rewritten as gamma probability
distributions,

When we wish to incorporate prior knowledge we may have about a particular parameter, the
scaled inverse chi-square distribution can be used as an informative prior distribution for the level-1 or level-2 error
variances. When this prior distribution is assumed for the level-1 and level-2 error variances, the
conditional probability distributions of the inverse variances are found to be kernels of a gamma distribution,



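$$\frac{1}{\sigma_\varepsilon^2} \sim \text{Gamma}\left(\frac{N + \nu}{2}, \frac{S + \sum_{k=1}^{K}\sum_{j=1}^{n_k}(\theta_{jk} - \alpha_k)^2}{2}\right), \tag{24}$$

$$\frac{1}{\tau_{00}} \sim \text{Gamma}\left(\frac{K + \nu}{2}, \frac{S' + \sum_{k=1}^{K}(\alpha_k - \gamma_{00})^2}{2}\right). \tag{25}$$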



The scaled inverse chi-square distribution can be scaled to reflect the increasingly informative prior
information one may have about the error variances. As the parameter $\nu$ becomes larger, this distribution
becomes more concentrated at the mean, $S/(\nu - 2)$. For values of $\nu$ between 1 and 4, the variance of the
scaled inverse chi-square distribution, $2S^2/[(\nu - 2)^2(\nu - 4)]$, is infinite, and this prior then becomes weak
relative to the data.
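
Drawing an error variance under either prior amounts to sampling its inverse from a gamma distribution. The sketch below is a hedged illustration: it assumes a shape and rate parameterization, a uniform prior on the variance itself for the noninformative case, and a scaled inverse chi-square prior whose mean is $S/(\nu - 2)$ for the informative case. The function name `draw_variance` and its arguments are hypothetical.

```python
import random

def draw_variance(sum_sq, n, rng, nu=None, s=None):
    """Draw a variance by sampling its inverse from a gamma distribution.

    sum_sq: sum of squared deviations; n: number of deviations (n > 2).
    nu, s: optional scaled inverse chi-square prior with mean s / (nu - 2);
    when omitted, a uniform prior on the variance is assumed.
    """
    if nu is None:
        shape, rate = n / 2.0 - 1.0, sum_sq / 2.0
    else:
        shape, rate = (n + nu) / 2.0, (s + sum_sq) / 2.0
    # random.gammavariate takes a shape and a scale, so the rate is inverted.
    precision = rng.gammavariate(shape, 1.0 / rate)
    return 1.0 / precision

rng = random.Random(1)
sum_sq = 210.357  # 742 deviations with an average squared size of 0.2835
flat = [draw_variance(sum_sq, 742, rng) for _ in range(2000)]
informative = [draw_variance(sum_sq, 742, rng, nu=10, s=2.268) for _ in range(2000)]
```

With this many level-1 units the two priors give nearly identical draws; the informative prior matters more for the level-2 variance, where the number of groups is much smaller.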

The full conditional distributions for the item difficulty parameters remain to be developed.
Recall that the prior distribution for the item difficulty parameters is a standard normal distribution.
Consequently, the full conditional distribution for item $i$ is similar, but slightly simpler in form, to that of
the latent trait $\theta_{jk}$,

$$p(\xi_i \mid \theta, \xi_{<i>}, \alpha, \gamma_{00}, \sigma_\varepsilon^2, \tau_{00}, x) \propto \frac{\exp\left[\sum_{k=1}^{K}\sum_{j=1}^{n_k} x_{ijk}(\theta_{jk} - \xi_i)\right]}{\prod_{k=1}^{K}\prod_{j=1}^{n_k}\left[1 + \exp(\theta_{jk} - \xi_i)\right]} \exp\left(-\frac{\xi_i^2}{2}\right). \tag{26}$$

As with the conditional distribution of the latent trait parameter, the full conditional distribution for the
item difficulty parameters does not have a common distributional form.

The full set of conditional probability distributions developed above forms the basis for the Rasch
hierarchical measurement model. Aside from the latent trait and item difficulty parameters, the
conditional probability distributions for the remaining model parameters are proportional to common
distributions and thus easy to sample from directly. The conditional probability distributions for the
latent trait and the item parameters cannot be directly sampled from, and the Metropolis-Hastings (M-H)
algorithm will be employed to draw samples from these conditional probability distributions.

Examples: Simulated Balanced & Unbalanced Data Sets

Two different simulated data sets, both having a two-level hierarchical structure, were created to
illustrate the Rasch hierarchical measurement model. One data set is balanced and represents an ideal
data situation, with each level-2 group containing an equal number of level-1 units. A practical example
of this data set occurs when a researcher gathers the same number of repeated measurements on a sample
of students. The other data set is a sparse data set that would occur when a small and unequal number
of measurements are made on a sample of students. This data set illustrates a more realistic situation that
a researcher may experience, and was used to evaluate the effectiveness of the model for challenging data
situations.



The structure of the first simulated data set replicated the structure of a data set from the Sloan
Study of Youth & Social Development utilized in a study conducted by Maier (2000). This data set is
referred to as the unbalanced data set and consists of N=742 response strings to 10 items. The level-1
response string units are sparsely dispersed within level-2 groups with three-quarters of the level-2 groups
containing two to three level-1 response string units, and the remaining level-2 groups containing between
four and six level-1 response string units. The second simulated data set, referred to as the balanced data
set, has the same total number of response string units, but with a different grouping structure that
consists of K=53 level-2 groups, each with $n_k = 14$ level-1 response string units.

The response strings for each of the two data sets were generated in the following manner. First, 
values for the item difficulty parameters were generated from a standard normal distribution. Next, values 
of the latent trait parameters were generated using values of the level-2 intercept and level-1 and level-2
error variances based on results from descriptive and IRT analyses of the data set constructed for the
Maier (2000) study. The actual values used for the data simulation were 0.2835 for the level-1 error
variance, 0.7099 for the level-2 error variance, and -0.0001 for the level-2 fixed intercept. Finally, the
probability that a level-1 unit would answer an item correctly was calculated using the Rasch IRT model
and the generated latent trait and item difficulty parameter values. To prevent the model from fitting the 
data perfectly, overdispersion was built into the simulation of the response strings: a unit’s response for a 
particular test item was assigned a value of one if the calculated probability of a correct response 
exceeded a randomly generated uniform number. 
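
The generation steps described above can be sketched for the balanced design as follows. This is an illustration only: the function name `simulate_responses`, its arguments, and the seed are hypothetical, and the latent traits are generated from a one-way random-effects structure using the variance values quoted in the text.

```python
import math
import random

def simulate_responses(n_groups, n_per_group, n_items, gamma00, sigma2_eps, tau00, seed=0):
    """Simulate dichotomous response strings with a two-level structure."""
    rng = random.Random(seed)
    # Item difficulties from a standard normal distribution.
    xi = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    data = []
    for _ in range(n_groups):
        # Level-2 group intercept around the fixed intercept.
        alpha_k = rng.gauss(gamma00, math.sqrt(tau00))
        group = []
        for _ in range(n_per_group):
            # Level-1 latent trait around the group intercept.
            theta_jk = rng.gauss(alpha_k, math.sqrt(sigma2_eps))
            row = []
            for xi_i in xi:
                p = 1.0 / (1.0 + math.exp(-(theta_jk - xi_i)))
                # Score 1 when the Rasch probability exceeds a uniform deviate.
                row.append(1 if p > rng.random() else 0)
            group.append(row)
        data.append(group)
    return xi, data

xi, data = simulate_responses(53, 14, 10, -0.0001, 0.2835, 0.7099)
```

An unbalanced data set could be produced in the same way by passing a list of group sizes instead of a single `n_per_group` value.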

Implementation of Gibbs Sampling & the Metropolis-Hastings Algorithm

Both simulated data sets were analyzed and the posterior distributions of the model parameters were 
produced using Gibbs sampling. The Metropolis-Hastings algorithm was used to draw samples from the 
conditional distributions of the latent trait and item difficulty parameters. For both the balanced and 
unbalanced data sets, two analyses were completed to produce a total of four complete analyses: one 
analysis assumed uniform prior distributions for the level-1 and level-2 variances while the other assumed 
scaled inverse chi-square prior distributions for the error variances. In particular, a scaled inverse
chi-square prior having $\nu = 10$ degrees of freedom and a mean $S = 2.268$ was assumed for the level-1
variance $\sigma_\varepsilon^2$. The scaled inverse chi-square prior assumed for the level-2 error variance $\tau_{00}$ had the same
degrees of freedom, but a mean of $S' = 5.689$. Swaminathan & Gifford (1982) suggested choosing
$5 < \nu < 15$ when utilizing this prior distribution with a Rasch IRT model.

The candidate-generating density $q(x, y)$ of the Metropolis-Hastings algorithm used to simulate
the latent trait and item difficulty parameters was chosen to be a normal distribution having a mean of the
current state of the chain $x$ and a standard deviation $c_n$. The form of this candidate-generating density
produces the random-walk Metropolis-Hastings algorithm. Since the candidate-generating density is
symmetric, $q(z) = q(-z)$, the probability of the chain moving from the current value $x$ to the proposed
value $y$ reduces to

$$\alpha(x, y) = \min\left\{\frac{\pi(y)}{\pi(x)}, 1\right\}, \tag{27}$$

where $\pi(\cdot)$ denotes the target full conditional density.

The standard deviation $c_n$ of the candidate-generating density was fixed to achieve an acceptance
proportion of roughly 0.5 (Gelman, Roberts, & Gilks, 1996; Patz & Junker, 1999b). For the balanced
data set, in the case of uniform priors for the level-1 and level-2 error variances, the acceptance
proportion was 0.5377 for the item difficulty parameters specifying a standard deviation $c_n = 0.15$ and
0.4500 for the latent trait parameters specifying $c_n = 1.0$. Assuming scaled inverse chi-square priors for the
level-1 and level-2 error variances and using the same values of $c_n$, the proportion of acceptance was
0.5388 for the item difficulty parameters and 0.4441 for the latent trait parameters. For the unbalanced 
data set assuming uniform priors for the error variances, the acceptance proportion was 0.5418 for the 
item difficulty parameters and 0.4871 for the latent trait parameters. The acceptance proportions utilizing 
the same data set and assuming scaled inverse chi-square priors for the error variances were 0.5410 for 
the item difficulty parameters and 0.4785 for the latent trait parameters. 
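
A random-walk Metropolis-Hastings update of the kind described above can be sketched for a single latent trait parameter. The sketch assumes that the target is the product of the Rasch likelihood for one response string and a normal prior centered on the group intercept; the function names, the example responses, and the item difficulties are hypothetical. In the full sampler only one such step would be taken per parameter in each Gibbs iteration, with the other parameters updated in between.

```python
import math
import random

def log_conditional_theta(theta_jk, responses, xi, alpha_k, sigma2_eps):
    """Log kernel of a latent trait full conditional (likelihood times normal prior)."""
    log_lik = sum(x * (theta_jk - b) - math.log1p(math.exp(theta_jk - b))
                  for x, b in zip(responses, xi))
    log_prior = -((theta_jk - alpha_k) ** 2) / (2.0 * sigma2_eps)
    return log_lik + log_prior

def random_walk_mh(log_target, start, step_sd, n_iter, rng):
    """Random-walk Metropolis-Hastings; returns the chain and the acceptance proportion."""
    current, current_lp = start, log_target(start)
    chain, accepted = [], 0
    for _ in range(n_iter):
        # Symmetric normal proposal centered on the current state.
        proposal = rng.gauss(current, step_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(target ratio, 1), evaluated on the log scale.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_iter

rng = random.Random(2)
xi = [-1.5, -0.5, 0.0, 0.5, 1.5]
responses = [1, 1, 0, 1, 0]
chain, acceptance = random_walk_mh(
    lambda t: log_conditional_theta(t, responses, xi, 0.0, 0.2835), 0.0, 1.0, 5000, rng)
```

Increasing `step_sd` lowers the acceptance proportion and decreasing it raises the proportion, which is how a target near 0.5 can be reached by trial runs.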

The values of the Markov chains of each model parameter were used to generate the 
corresponding marginal distribution for each of the parameter estimates. The starting values used for the
analyses appear in Table 1. The initial value used for the latent trait parameter of each level-1 unit was
simply the raw score averaged across test items. For all analyses, 30,000 iterations of the algorithm were
run. The first 1,000 iterations were considered to be the burn-in iterations and these corresponding
deviates were discarded. The resulting 29,000 iterations formed the basis for parameter estimation. 

Insert Table 1 Here 

Results of Analysis 

The results of the analyses appear in Tables 2-5 for the balanced and unbalanced data sets. The 
true values of each of the model parameters are listed in the second column of the table. This value can 
be compared to the mean of the deviates over 29,000 iterations. The variance of the posterior distribution 
is also calculated. The time-series standard error of the estimate of the mean can be used as an estimate 
of the Monte Carlo error. The final column of the table specifies the 95% credibility interval for the 
deviates. 
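
The summaries reported in the tables can be computed from a stored chain along the following lines. This is a sketch with hypothetical names: it discards the burn-in, then reports the posterior mean, the posterior variance, and an equal-tailed 95% interval taken from the sorted deviates. The time-series standard error is not reproduced here because it requires a spectral density estimate.

```python
import random
import statistics

def summarize_chain(chain, burn_in):
    """Posterior mean, variance, and 95% interval after discarding the burn-in."""
    kept = sorted(chain[burn_in:])
    n = len(kept)
    lower = kept[int(0.025 * (n - 1))]
    upper = kept[int(0.975 * (n - 1))]
    return {
        "n": n,
        "mean": statistics.fmean(kept),
        "variance": statistics.variance(kept),
        "interval": (lower, upper),
    }

# Stand-in chain of independent normal deviates (illustrative, not study output).
rng = random.Random(5)
chain = [rng.gauss(0.5, 0.1) for _ in range(30000)]
summary = summarize_chain(chain, burn_in=1000)
```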

Insert Table 2 Here 
Insert Table 3 Here 
Insert Table 4 Here 
Insert Table 5 Here 

Examining the results for the item difficulty parameters first, the agreement between the mean of 
the posterior distribution of the estimate and the true value for the parameter is quite good. For both data 
sets, the true value lies within the 95% credibility interval for all but one of the item difficulty parameters. 
The true value of Item 9 lies just outside the 95% credibility interval of the estimate, but within the 97.5% 
credibility interval. The standard errors of the estimated posterior means of the item difficulty parameters
range from a high value of 0.00142 to a low value of 0.00088, estimated by dividing the square
root of the spectral density estimate by the sample size. These statistics were calculated using CODA 
software (Best, Cowles, & Vines, 1995). 

The particular number of burn-in iterations was chosen based on examination of autocorrelation
values and time series plots. Examination of these plots and statistics showed that almost all of the Markov
chains exhibited common behavior that was indicative of a rapidly mixing Markov chain. The notable
exception is the level-1 error variance $\sigma_\varepsilon^2$, which will be addressed separately below. Aside from this
particular parameter, the time series plots of the remaining parameter estimates show acceptable mixing
patterns. Figure 1 shows time-series plots of the Markov chain for Item 2, unbalanced data set, assuming
uniform priors for the level-1 and level-2 error variances and Figure 2 shows the corresponding plot for
the balanced data set assuming scaled inverse chi-square priors. Figures 3-6 show the corresponding plots
for the level-2 and the level-1 error variances. The first four figures provide good examples of the type of
rapid mixing that occurred with the Markov chains of most of the remaining parameter estimates.
However, the time-series plots for the level-1 error variance show a lower rate of mixing, perhaps
indicating that the Markov chain may not have converged.

Insert Figure 1 Here 
Insert Figure 2 Here 
Insert Figure 3 Here 
Insert Figure 4 Here 
Insert Figure 5 Here 
Insert Figure 6 Here 

The autocorrelation values of the Markov chains for most of the parameters rapidly approach zero 
as the lag increases. Table 6 shows autocorrelation values corresponding to lags of 1, 5, 10, and 50 for 
the Markov chains of all parameter estimates of the balanced data set. The values of autocorrelation for 
the unbalanced data set show the same pattern. As indicated in this table, the autocorrelation values 
rapidly approach zero for almost all of the parameter estimates, a property that indicates rapid Markov
chain mixing. As with the time-series plots, the notable exception to this behavior is the level-1 error
variance $\sigma_\varepsilon^2$.
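
Autocorrelations at the lags reported in Table 6 can be computed directly from a chain. The sketch below uses the usual sample autocorrelation estimator and applies it to a stand-in first-order autoregressive series, not to output from the study; the function name and the autoregressive coefficient are assumptions made for illustration.

```python
import random

def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    denom = sum((v - mean) ** 2 for v in chain)
    numer = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return numer / denom

# Stand-in chain: an autoregressive series with coefficient 0.8 (illustrative).
rng = random.Random(3)
chain = [0.0]
for _ in range(19999):
    chain.append(0.8 * chain[-1] + rng.gauss(0.0, 1.0))

lags = {lag: autocorrelation(chain, lag) for lag in (1, 5, 10, 50)}
```

A rapidly mixing chain shows values near zero by the larger lags, while a slowly mixing chain keeps sizable values there. Keeping every third draw, as in `chain[::3]`, is one way to examine the effect of a thinning interval on these values.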

Insert Table 6 Here

Exploration of Level-1 Error Variance

Additional analyses were completed to further examine convergence and mixing rates of the
Markov chains for the level-1 error variance, and the Gelman & Rubin (1992) convergence diagnostic was
calculated for the corresponding Markov chains. Since this diagnostic requires multiple Markov chains, three
separate sets of Markov chains were run for each of the two simulated data sets, assuming three different
starting values for the level-1 error variance. These starting values were 1.0, 3.0, and 5.0, while the
starting values for the other parameters were retained from the first analysis. The Gelman & Rubin
diagnostic was calculated for the three chains of the level-1 error variance using CODA software. All
three Markov chains for the balanced and unbalanced data sets met the Gelman & Rubin criteria for
convergence, suggesting that the Markov chains of the level-1 error variance converged to a stationary
distribution.
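
The idea behind the multiple-chain diagnostic can be sketched with the basic potential scale reduction factor, which compares the variance between chains with the variance within chains. This simplified version omits the degrees-of-freedom correction that CODA applies, so its values may differ slightly from those reported by that software; the function name and the stand-in chains are hypothetical.

```python
import math
import random

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain and within-chain variance estimates.
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

rng = random.Random(4)
# Three chains sampling the same distribution: the factor should be near one.
mixed = [[rng.gauss(0.3, 0.05) for _ in range(2000)] for _ in range(3)]
# Three chains stuck around different values: the factor should be large.
stuck = [[rng.gauss(center, 0.05) for _ in range(2000)] for center in (0.0, 1.0, 2.0)]
r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to one are usually read as evidence that chains started from dispersed values have reached the same distribution.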

Autocorrelation values of this additional set of Markov chains were examined to assess the rate of 
mixing. These values were comparable to those of the original Markov chains of all the model parameters.
These findings suggest that applying a thinning interval to the Markov chain for the level-1 error variance
may be an appropriate strategy to improve mixing rate. Autocorrelation values for Markov chains with
different thinning intervals were examined and a thinning interval of 3 was identified as the best option 
because it considerably reduced the autocorrelation without increasing the Monte Carlo variance 
substantially. 

Overall, the additional set of Markov chains for the level-1 error variance behaved similarly to the
chains originally simulated. As with the original Markov chains, the 95% credibility intervals (averaged
from the three Markov chains) contain the true value of the level-1 error variance. In most cases, the
posterior distributions are not centered on the true value of the parameter. For both data sets, the mean of
the posterior distribution for the level-1 error variance in the unbalanced data set slightly overestimates
the true value, while the posterior means for the level-2 error variance slightly overestimate the true
value of the parameter.

Both the original Markov chain and the additional set of Markov chains demonstrate similar 
behavior for the error variance estimates. Clearly this behavior was not a statistical artifact present only 
in the original Markov chains. It was decided to investigate whether this behavior was related to the true 
values of the error variance parameters, especially in the case of the level-1 variance, which is fairly close 
to zero. New balanced and unbalanced Rasch HMM data sets were simulated using a value of one for the 
level-1 variance $\sigma_\varepsilon^2$. Model parameters were estimated using the same MCMC algorithm as used for the 
original data sets. The first 1000 iterations were discarded as the burn-in, and the Markov chains mixed 
adequately, as indicated by examination of the time-series plots. Again, the 95% credibility intervals 
contain the true values of the parameters; again, the posterior distributions are not centered on the true 
values of the parameters. 

Since the results for the new data sets are similar to those of the original data sets, this pattern does 
not seem to be related to the true value of the level-1 variance. However, the pattern could 
conceivably be linked to the process used to simulate the data sets. As mentioned previously, overdispersion 
was built into the model by comparing the probability of a correct response to a randomly generated 
uniform deviate. This procedure may very well account for the discrepancies between posterior means 
and true values of the level-1 and level-2 variances. And, although the Markov chains of the level-1 error 
variance seem to exhibit a lower rate of mixing, the chains meet the criteria of a variety of convergence 
diagnostics, indicating that stationarity had been reached. 
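
The data-generation step referred to above, in which a Rasch response probability is compared to a uniform deviate, might look like the following sketch. The hierarchical parameter values are the true values listed in Table 4; the number of groups, the group size, and the item difficulties are hypothetical and are not the values used for the simulated data sets.

```python
import numpy as np

rng = np.random.default_rng(3)

# True hierarchical values from Table 4.
gamma00, tau00, sigma2 = -0.000125, 0.709858, 0.283486
# Hypothetical design: 20 groups of 30 persons, 10 items.
n_groups, n_per_group = 20, 30
difficulties = np.linspace(-2.0, 1.2, 10)

# Level 2: group intercepts. Level 1: person latent traits within groups.
group_intercepts = rng.normal(gamma00, np.sqrt(tau00), size=n_groups)
theta = rng.normal(np.repeat(group_intercepts, n_per_group), np.sqrt(sigma2))

# Rasch probability of a correct response for every person-item pair.
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))

# A response is scored 1 when the uniform deviate falls below the probability.
u = rng.uniform(size=p.shape)
responses = (u < p).astype(int)
print(responses.shape, responses.mean(axis=0).round(2))
```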

Comparison to a Two-Step Approach 

To illustrate how the Rasch hierarchical measurement model performs relative to a traditional 
two-step approach, the simulated balanced data set was reanalyzed. First, estimates of the latent trait 
parameter for each of the N=742 response strings were produced according to a Rasch item response 
model, using the BIGSTEPS program (Wright & Linacre, 1993). The true values of the item difficulty 
parameters were given for this step, so as to make equating unnecessary. The resulting latent trait 
parameter estimates were then used as the outcome variable for a two-level hierarchical linear model that 
utilized the same hierarchical structure as the simulated balanced data set. The hierarchical coefficients 
were estimated using the HLM program (Bryk, Raudenbush, & Congdon, 1996). The results of this 
analysis appear in Table 7. For this particular data set, the two-step analysis approach grossly 
overestimates the level-1 random error variance and underestimates the level-2 random error variance 
while correctly estimating the level-2 fixed intercept. Clearly in this case, the Rasch HMM models the data 
much better than the two-step strategy. 
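
One possible reading of the inflated level-1 variance, offered here as an assumption rather than as a result of the analysis, is that the first-step estimate of each latent trait carries estimation error that the second step cannot separate from the level-1 residual. Writing $e_{jk}$ for that error and $u_{0k}$ for the level-2 random effect (both symbols are introduced only for this illustration), and assuming the error is independent of the other terms:

```latex
\hat{\theta}_{jk} = \theta_{jk} + e_{jk}, \qquad
\theta_{jk} = \gamma_{00} + u_{0k} + \varepsilon_{jk}
\\[4pt]
\operatorname{Var}\!\left(\hat{\theta}_{jk}\right)
  = \tau_{00} + \sigma_\varepsilon^2 + \operatorname{Var}(e_{jk})
\\[4pt]
\underbrace{0.951460}_{\text{two-step level-1 estimate}}
  - \underbrace{0.283486}_{\text{true } \sigma_\varepsilon^2}
  = 0.667974
```

Under this reading, the difference of about 0.67 between the two-step level-1 estimate in Table 7 and the true value would be attributable to the error variance of the latent trait estimates, which the Rasch HMM accounts for by estimating all parameters simultaneously.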
Insert Table 7 Here 

Implementation and Future Research 

To obtain estimates of the hierarchical measurement model parameters, the Gibbs sampling 
algorithms were implemented in a computer program written in Visual C++. The object-oriented 
capabilities of C++ make this language a natural fit for the nested multi-parameter structure of the 
hierarchical measurement model. Producing estimates for the Rasch hierarchical measurement model 
required between 18 and 24 minutes to run 30,000 iterations of the Gibbs sampling algorithm on a CPU with a 
450 MHz processor and 192 MB of memory. CODA software (Best et al., 1995) was used as a post-Gibbs 
analysis tool. This software was used to calculate estimates of the mean, standard error of the 
mean, and the variance of the posterior distributions of the model parameters, as well as to generate 
time-series plots and autocorrelation values. 
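
The kind of post-sampling summary described above can be sketched as follows. The standard error here is computed by the method of batch means, which is a swapped-in technique chosen for simplicity and is not the time-series estimator used by CODA. The function name `posterior_summary`, the number of batches, and the AR(1) stand-in chain are all hypothetical; the burn-in of 1000 iterations follows the text.

```python
import numpy as np

def posterior_summary(chain, burn_in=1000, n_batches=50):
    """Mean, variance, and two standard errors of the mean for one chain."""
    draws = np.asarray(chain, dtype=float)[burn_in:]
    variance = draws.var(ddof=1)
    # Naive SE treats the draws as independent.
    naive_se = np.sqrt(variance / draws.size)
    # Batch-means SE allows for autocorrelation between successive draws.
    batch_size = draws.size // n_batches
    batches = draws[: batch_size * n_batches].reshape(n_batches, batch_size)
    batch_se = np.sqrt(batches.mean(axis=1).var(ddof=1) / n_batches)
    return {"mean": float(draws.mean()), "variance": float(variance),
            "naive_se": float(naive_se), "batch_se": float(batch_se)}

# Stand-in chain (assumption): 30,000 autocorrelated draws from an AR(1) process.
rng = np.random.default_rng(4)
n, rho = 30000, 0.9
noise = rng.normal(size=n)
chain = np.empty(n)
chain[0] = noise[0]
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + noise[t]

summary = posterior_summary(chain)
print(summary)
```

For a strongly autocorrelated chain the batch-means standard error is noticeably larger than the naive one, which is why a time-series standard error is reported rather than the naive value.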

Implementation time. 

The Rasch hierarchical measurement model is very specialized because it is appropriate for 
dichotomous responses only and does not allow incorporation of any level-1 or level-2 covariates. The 
usefulness of hierarchical measurement models hinges on the degree to which these models can be 
generalized. Generalization can occur along at least three avenues. Different IRT models can be 
incorporated into the model. Work is currently being done that integrates a Partial Credit IRT model with 
a 2-level hierarchical linear model, resulting in a Partial Credit HMM. Another way to expand the 
hierarchical measurement model is to consider an alternative distribution for the level-1 random intercept 
or the level-2 error variances. A hierarchical measurement model is currently being investigated that 
utilizes a t-distribution for the latent trait parameters. This model would allow outlier level-1 groups to be 
modeled appropriately. Additionally, more complex item response models and hierarchical linear models 
will be considered. 

Table 1: Starting Model Parameter Values used for Analyses of Rasch HMM Data Sets 

Model Parameter                             Starting Value
Level-1 Variance, $\sigma_\varepsilon^2$    0.35
Level-1 Intercept, $\alpha_k$               0.50
Level-2 Variance, $\tau_{00}$               1.00
Level-2 Intercept                           0.50
Latent Trait, $\theta_{jk}$                 Average raw score
Item Parameters, $\xi_i$                    0.00


Table 2: Item Difficulty Parameter Estimates under Uniform and Scaled Inverse Chi-square Prior 
Distributions for Level-1 and Level-2 Error Variances, Rasch HMM Balanced Data Set 

Model      Priors             True Value   Mean      Time-series   Variance   95% Credibility
Parameter                                            SE of Mean               Interval
$\xi_0$    Uniform            -1.95903     -1.89     0.00142       0.00912    (-2.090, -1.710)
           Inverse $\chi^2$                -1.89     0.00136       0.00922    (-2.080, -1.700)
$\xi_1$    Uniform            0.75865      0.791     0.00101       0.00674    (0.632, 0.953)
           Inverse $\chi^2$                0.788     0.00106       0.00658    (0.627, 0.945)
$\xi_2$    Uniform            1.22899      1.32      0.00117       0.00766    (1.150, 1.490)
           Inverse $\chi^2$                1.32      0.00114       0.00803    (1.140, 1.490)
$\xi_3$    Uniform            0.46361      0.394     0.00094       0.00618    (0.241, 0.549)
           Inverse $\chi^2$                0.392     0.00096       0.00612    (0.238, 0.547)
$\xi_4$    Uniform            -0.61123     -0.548    0.00100       0.00605    (-0.703, -0.395)
           Inverse $\chi^2$                -0.548    0.00096       0.00594    (-0.700, -0.396)
$\xi_5$    Uniform            -1.07601     -1.05     0.00102       0.00679    (-1.210, -0.887)
           Inverse $\chi^2$                -1.04     0.00106       0.00666    (-1.210, -0.885)
$\xi_6$    Uniform            -0.09302     -0.177    0.00091       0.00593    (-0.326, -0.025)
           Inverse $\chi^2$                -0.177    0.00090       0.00594    (-0.329, -0.026)
$\xi_7$    Uniform            -0.26429     -0.198    0.00088       0.00573    (-0.346, -0.050)
           Inverse $\chi^2$                -0.196    0.00092       0.00585    (-0.345, -0.044)
$\xi_8$    Uniform            0.62427      0.595     0.00093       0.00612    (0.441, 0.749)
           Inverse $\chi^2$                0.591     0.00096       0.00616    (0.437, 0.744)
$\xi_9$    Uniform            0.92816      0.766     0.00102       0.00654    (0.609, 0.925)
           Inverse $\chi^2$                0.766     0.00102       0.00645    (0.611, 0.924)


Table 3: Posterior Distribution of Item Difficulty Parameters under Uniform and Scaled Inverse 
Chi-square Prior Distributions for Level-1 and Level-2 Error Variances, Rasch HMM 
Unbalanced Data Set 

Model      Priors             True Value   Mean      Time-series   Variance   95% Credibility
Parameter                                            SE of Mean               Interval
$\xi_0$    Uniform            -1.95903     -1.96     0.00155       0.0104     (-2.160, -1.760)
           Inverse $\chi^2$                -1.95     0.00143       0.0104     (-2.150, -1.750)
$\xi_1$    Uniform            0.75865      0.771     0.00104       0.00638    (0.614, 0.929)
           Inverse $\chi^2$                0.769     0.00103       0.00637    (0.613, 0.925)
$\xi_2$    Uniform            1.22899      1.25      0.00109       0.00733    (1.080, 1.420)
           Inverse $\chi^2$                1.24      0.00110       0.00694    (1.080, 1.410)
$\xi_3$    Uniform            0.46361      0.470     0.00095       0.00613    (0.315, 0.622)
           Inverse $\chi^2$                0.465     0.00092       0.00601    (0.312, 0.617)
$\xi_4$    Uniform            -0.61123     -0.725    0.00099       0.00653    (-0.887, -0.566)
           Inverse $\chi^2$                -0.721    0.00096       0.00643    (-0.879, -0.564)
$\xi_5$    Uniform            -1.07601     -1.04     0.00108       0.00714    (-1.210, -0.879)
           Inverse $\chi^2$                -1.04     0.00115       0.00726    (-1.210, -0.869)
$\xi_6$    Uniform            -0.09302     -0.108    0.00095       0.00604    (-0.263, 0.045)
           Inverse $\chi^2$                -0.109    0.00096       0.00601    (-0.261, 0.042)
$\xi_7$    Uniform            -0.26429     -0.262    0.00098       0.00607    (-0.414, -0.108)
           Inverse $\chi^2$                -0.260    0.00097       0.00613    (-0.414, -0.105)
$\xi_8$    Uniform            0.62427      0.620     0.00094       0.00615    (0.467, 0.774)
           Inverse $\chi^2$                0.619     0.00091       0.00594    (0.466, 0.770)
$\xi_9$    Uniform            0.92816      0.985     0.00102       0.00681    (0.821, 1.150)
           Inverse $\chi^2$                0.980     0.00105       0.00664    (0.822, 1.140)


Table 4: Hierarchical Parameter Estimates under Uniform and Scaled Inverse Chi-square Prior 
Distributions for Level-1 and Level-2 Error Variances, Rasch HMM Balanced Data Set 

Model                    Priors             True Value   Mean      Time-series   Variance   95% Credibility
Parameter                                                          SE of Mean               Interval
$\gamma_{00}$            Uniform            -0.000125    -0.0893   0.00087       0.01232    (-0.306, 0.129)
                         Inverse $\chi^2$                -0.0888   0.00089       0.01210    (-0.303, 0.126)
$\sigma_\varepsilon^2$   Uniform            0.283486     0.282     0.00144       0.00225    (0.196, 0.380)
                         Inverse $\chi^2$                0.266     0.00132       0.00187    (0.188, 0.355)
$\tau_{00}$              Uniform            0.709858     0.600     0.00146       0.0196     (0.380, 0.920)
                         Inverse $\chi^2$                0.572     0.00116       0.0137     (0.383, 0.840)



Table 5: Posterior Distribution of Hierarchical Parameters under Uniform and Scaled Inverse Chi-square 
Prior Distributions for Level-1 and Level-2 Error Variances, Rasch HMM Unbalanced Data 
Set 

Model                    Priors             True Value   Mean      Time-series   Variance   95% Credibility
Parameter                                                          SE of Mean               Interval
$\gamma_{00}$            Uniform            -0.000125    -0.0722   0.00076       0.00325    (-0.040, 0.183)
                         Inverse $\chi^2$                -0.0722   0.00078       0.00312    (-0.038, 0.182)
$\sigma_\varepsilon^2$   Uniform            0.283486     0.409     0.00204       0.00493    (0.282, 0.556)
                         Inverse $\chi^2$                0.373     0.00187       0.00412    (0.255, 0.508)
$\tau_{00}$              Uniform            0.709858     0.580     0.00180       0.0078     (0.420, 0.769)
                         Inverse $\chi^2$                0.576     0.00157       0.0064     (0.431, 0.745)


Table 6: Autocorrelation Values as a Function of Lag, Rasch HMM Balanced Data Set 

Model                    Level-1 & Level-2  Autocorrelation
Parameter                Priors             Lag 1      Lag 5      Lag 10     Lag 50
$\xi_0$                  Uniform            0.69300    0.19200    0.06220    0.00811
                         Inverse $\chi^2$   0.68900    0.19200    0.05730    -0.01010
$\xi_1$                  Uniform            0.66400    0.13900    0.02050    0.01200
                         Inverse $\chi^2$   0.65900    0.13900    0.03940    0.00181
$\xi_2$                  Uniform            0.67000    0.15200    0.02890    0.00624
                         Inverse $\chi^2$   0.67400    0.15100    0.03710    0.00417
$\xi_3$                  Uniform            0.65300    0.11100    0.01990    0.00976
                         Inverse $\chi^2$   0.65200    0.13700    0.01700    -0.00305
$\xi_4$                  Uniform            0.65200    0.12900    0.02600    -0.00388
                         Inverse $\chi^2$   0.65200    0.12800    0.01580    0.00237
$\xi_5$                  Uniform            0.66500    0.15900    0.02660    -0.01520
                         Inverse $\chi^2$   0.66700    0.14700    0.02940    -0.00038
$\xi_6$                  Uniform            0.65500    0.13600    0.01790    0.00751
                         Inverse $\chi^2$   0.64800    0.12600    0.01110    -0.00109
$\xi_7$                  Uniform            0.63700    0.11400    0.01240    -0.00089
                         Inverse $\chi^2$   0.64400    0.12500    0.03240    0.00175
$\xi_8$                  Uniform            0.65200    0.13700    0.01900    -0.00673
                         Inverse $\chi^2$   0.64600    0.13900    0.02920    -0.00078
$\xi_9$                  Uniform            0.66500    0.14200    0.02180    0.00538
                         Inverse $\chi^2$   0.65900    0.14400    0.04990    0.00386
$\gamma_{00}$            Uniform            0.09960    0.03880    0.02070    0.00588
                         Inverse $\chi^2$   0.09790    0.04480    0.03570    -0.00230
$\sigma_\varepsilon^2$   Uniform            0.90100    0.76500    0.62600    0.12000
                         Inverse $\chi^2$   0.89600    0.75800    0.61400    0.15000
$\tau_{00}$              Uniform            0.18600    0.08570    0.05030    0.01060
                         Inverse $\chi^2$   0.16600    0.08800    0.06140    0.01250


Table 7: Estimates of Hierarchical Parameters from Two-Step Analysis, Rasch HMM Balanced Data Set 

Model Parameter    Coefficient    SE          T-ratio
$\gamma_{00}$      -0.109593      0.113543    -0.965

$\sigma_\varepsilon^2$ = 0.951460 (SE = 0.78255) 
$\tau_{00}$ = 0.612390 (SE = 0.97543) 

Figure 1: Time-series Plot of Item 2, Rasch HMM Unbalanced Data Set, Uniform Priors for 
Level-1 and Level-2 Error Variances 

Figure 2: Time-series Plot of Item 2, Rasch HMM Balanced Data Set, Scaled Inverse Chi-square 
Priors for Level-1 and Level-2 Error Variances 

Figure 3: Time-series Plot of $\tau_{00}$, Rasch HMM Unbalanced Data Set, Uniform Priors for Level-1 
and Level-2 Error Variances 

Figure 4: Time-series Plot of $\tau_{00}$, Rasch HMM Balanced Data Set, Scaled Inverse Chi-square 
Priors for Level-1 and Level-2 Error Variances 

Figure 5: Time-series Plot of $\sigma_\varepsilon^2$, Rasch HMM Unbalanced Data Set, Uniform Priors for Level-1 and 
Level-2 Error Variances 

Figure 6: Time-series Plot of $\sigma_\varepsilon^2$, Rasch HMM Balanced Data Set, Scaled Inverse Chi-square Priors for 
Level-1 and Level-2 Error Variances 